Instructing a Reinforcement Learner
Authors
Abstract
In reinforcement learning (RL), reward has traditionally been regarded as the primary feedback signal for understanding the environment. Recently, however, there have been interesting forays into other modes of feedback, such as sporadic supervisory inputs, which bring richer information about the world of interest into the learning process. In this paper, we model these supervisory inputs as specific types of instructions that convey an expert's control decision and certain structural regularities in the state space. We provide a mathematical formulation of such instructions and propose a framework for incorporating them into the learning process.
Similar resources
Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the ch...
Introducing interactive help for reinforcement learners
The reinforcement learning problem becomes very difficult when considering real-size applications. To solve it, we think that many issues should be studied together. To achieve such an endeavor, we also think that it is quite common that human beings can provide help on-the-fly to the reinforcement learner, that is, when he/she sees how the learner is (mis)behaving, or could perform bette...
Inverse Reinforcement Learning Under Noisy Observations (Extended Abstract)
We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, noisy observations of the trajectory are available. We generalize the previous method of expectation-maximization for inverse reinforcement learning, which allows the trajectory of the expert to be partially hidden from the learner, to incorpo...
Agnostic KWIK learning and efficient approximate reinforcement learning
A popular approach in reinforcement learning is to use a model-based algorithm, i.e., an algorithm that utilizes a model learner to learn an approximate model to the environment. It has been shown that such a model-based learner is efficient if the model learner is efficient in the so-called “knows what it knows” (KWIK) framework. A major limitation of the standard KWIK framework is that, by it...